Radio Oranje: Enhanced Access to a Historical Spoken Word Collection
نویسندگان
چکیده
Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an
منابع مشابه
Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval
This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of EnglishMandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-based retrieval model). English textual queries ar...
متن کاملSpeechfind: an experimental on-line spoken document retrieval system for historical audio archives
In this study, we present the SpeechFind system, an experimental on-line spoken document retrieval system for historical audio archives. As part of an on-going U.S. NSF Digital Library Initiative project, entitled the National Gallery of the Spoken Word (NGSW), SpeechFind is intended to serve as an audio index and search engine for spoken word collections spanning the 20th century with as much ...
متن کاملA robust fusion method for multilingual spoken document retrieval systems employing tiered resources
In this study, we present two novel fusion approaches to merge subword and word based retrieval methods within a multilingual spoken document retrieval (SDR) system. Considering the fact that more than 6000 languages are spoken in the world today, resources (e.g., text and audio data, pronunciation lexicon) needed to develop Automatic Speech Recognition (ASR) systems for such a range of languag...
متن کاملSemantic processing survey of spoken and written words in adolescents with cerebral palsy: Evidence from PALPA word-picture matching test
Objective: The present study aimed to assess and compare semantic processing of spoken and written words in adolescents with cerebral palsy and healthy adolescents. Method: The present study is quantitative in terms of type and experimental in terms of method. Examination Group consisted 30 adolescents with cerebral palsy aged 10 to 15 years were selected by convenience sampling method. All of ...
متن کاملWord Image Matching Using Dynamic Time Warping
Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on electronic media. Convenient access to a collection requires an index, which is manually created at great labour and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting...
متن کامل